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Notation 


(a«)?=o:  denominator  polynomial  coefficients  (a0  =1) 

A(z ):  the  polynomial  formed  by 
(c«)"=o:  the  numerator  polynomial  coefficients 
C(z ):  the  numerator  polynomial 
(bi )"i_nc:  the  “numerator  square”  polynomial  coefficients 
:  Note  that  6_t  =  6, 

B(z):  the  numerator  square  polynomial:  C(z)C(z_I) 

(r,),:  the  autocorrelation  coefficients 

(l/i),:  the  time  series  measurements 

(f,),:  the  standard  unbiased  estimates  of  (r,)< 

(«.■)&:  the  Yule  Walker  estimates  of  (a,)-‘J1  based  on  (r,), 

(b>)i=o:  the  estimates  of  based  on  (fj)j  and  (a,)J**j 

(a,)j:  the  coefficients  of  z1  in  the  long  division  of  — 1 

(/?,)<:  the  coefficients  of  B(z)  ■  B(z) 


IX 


1.  Introduction 


There  are  various  areas  in  engineering  where  the  problem  of  finding  a  compact 
representation  of  a  signal  arises.  Reasons  for  using  such  a  representation  include:  the 
representation  is  directly  related  to  the  information  in  the  signal,  the  representation 
can  be  transmitted  over  channels  with  a  smaller  bandwidth1  than  would  be  possible 
for  the  original  signal,  and  the  representation  has  noise  immunity  properties. 

One  representation  that  is  used  very  widely  is  the  ARMA  model.  ARMA  is  an 
abbreviation  of  AutoRegressive  Moving  Average,  meaning  that  the  signal’s  value 
at  some  point  in  time  is  both  a  function  of  the  past  values  of  the  signal  (  AutoRe¬ 
gressive  )  and  of  the  present  and  past  values  of  an  i.i.d.  white  noise  signal  (  Moving 
Average  ).  That  is,  we  can  model  the  signal  as: 

na  nc 

Vk  =  ~YjaiVk~i  + 

i= 1  j—0 

where  (yk)k  is  the  signal  of  interest,  and  (e*)*  is  a  white  noise  signal.  The  reasons 
the  ARMA  model  is  popular  are:  that  it  is  relatively  easy  to  compute  from  the 
signal  (y*;)fc,  and  that  the  a,  and  c;  coefficients  relate  directly  to  the  shape  of  the 
spectrum  of  the  signal. 

Another  way  of  looking  at  the  ARMA  model  results  from  its  frequency  domain 
interpretation:  We  are  modelling  the  signal  (yk)k  as  the  output  of  a  linear  filter 
driven  by  white  noise  (  see  Figure  1  ).  Since  white  noise  is  spectrally  fiat  by 
definition,  the  spectral  shape  of  the  signal  is  completely  determined  by  the  filter. 

'This  includes  memory  storage,  which  can  be  seen  as  a  transmission  through  time  rather  than 


space. 


White  Noise 


Filter 


Our  Signal 


Figure  1.1:  Signal  Model 

As  an  example  application,  consider  the  transmission  of  speech  over  a  narrow- 
band  digital  channel  (  the  situation  most  western  telephone  companies  want  to 
convert  to  ).  One  way  would  be  to  simply  digitize  the  speech  signal  and  transmit 
the  digitized  version;  however,  given  that  reliable  human  recognition  of  the  speech 
signal  requires  that  the  spectrum  of  the  signal  at  the  receiving  end  coincides  with 
the  original  spectrum  for  at  least  several  kiloHertz,  and  that  we  need  a  reasonably 
fine  quantization  for  the  same  reason,  it  is  clear  that  we  need  a  channel  capacity  of 
many  kilobaud  with  this  method.  On  the  other  hand,  we  could  compress  the  speech, 
using  ARMA  modeling  techniques,  transmit  the  compressed  signal,  and  reconstruct 
the  speech  from  the  compressed  signal  at  the  receiver.  Transmission  equipment  is 
available  that  transmits  speech  understandably  in  this  way  while  using  as  little  as 
4  kilobaud  [10].  Since  state-of-the-art  is  not  yet  good  enough  to  allow  the  person  at 
the  receiving  end  to  recognize  the  speaker,  this  method  is  not  yet  suitable  for  general 
use;  however,  applications  in  verification/identification  of  authorized  personel  and 
in  voice  command  of  machinery  exist  [15]. 


ARMA  modeling  techniques  axe  used  in  various  other  areas:  in  communications, 
including  adaptive  matched  filters  in  transmission  channels  see  [1],  and  in  geologic 
surveying  by  means  of  acoustical  procedures  (  used  e.g.  to  find  oil  and  gas  ):  see 
[2]  and  [13].  See  [12]  for  more  applications. 

There  are  various  ways  of  estimating  the  coefficients  a,  and  c,,  but  they  can 
generally  be  divided  into  two  groups:  computationally  simple  but  not  of  very  high 
statistical  quality  (  i.e.  the  variance  of  the  estimated  coefficients  is  higher  than 
necessary  ),  or  computationally  expensive  but  of  high  statistical  quality.  This  report 
is  concerned  with  a  compromise  algorithm:  on  the  one  hand  it  is  computationally 
efficient,  but  on  the  other  hand  it  asymptotically  (  i.e.  for  the  data  length  of  the 
available  signal  growing  to  infinity  )  achieves  the  maximum  statistical  quality.  The 
report  studies  the  behaviour  of  the  algorithm  for  finite  data  lengths. 

The  subject  algorithm  has  been  formulated  in  [16]  and  its  computational  as¬ 
pects  have  been  addressed  in  [14].  It  is  known  [16]  that  the  algorithm  approaches 
the  Cramer-Rao  Lower  Bound  (  CRLB  )  as  the  number  of  measured  data  points 
N  (  i.e.  we  have  measurements  of  j/o'-’J/n  for  some  N  )  grows  to  infinite  (  i.e. 
N  — *  oo  ).  Under  these  asymptotic  conditions  it  has  also  been  shown  that  the 
algorithm  is  numerically  well-conditioned  if  the  polynomial  C  does  not  have  roots 
with  approximately  unit  magnitude.  The  algorithm  as  formulated  in  [16]  and  [14] 
leaves  two  freedoms  of  choice:  the  number  of  instrumental  variables  to  be  used 
and  the  number  of  postiterations  (  both  terms  will  be  clarified  in  the  next  chap¬ 
ter  ).  Asymptotic  variance  expressions  as  a  function  of  each  of  these  parameters 


In  this  report  we  investigate  what  effects  a  finite  data  length  has  on  the  prop¬ 
erties  of  the  algorithm.  Numerical  problems  (  ill-conditioning  )  for  finite  N,  and 
the  behaviour  of  the  algorithm  as  a  function  of  the  number  of  postiterations  and  as 
a  function  of  the  number  of  instrumental  variables  are  studied,  and  are  compared 
with  the  asymptotic  expressions.  We  will  find  that  several  problems  do  exist  and 
we  will  develop  several  variations  on  the  algorithm  that  attempt  to  reduce  these 
problems. 


2.  Background 


This  chapter  is  a  short  summary  of  the  relevant  material  from  [16]  and  [14];  it 
formulates  the  spectral  model  as  well  as  the  estimator  and  briefly  discusses  them. 


2.1  The  ARMA  Spectral  Model 


As  stated  in  the  introduction,  we  want  to  estimate  the  spectrum  of  a  signal 
(Vk)k-  The  general  formulation  of  a  spectral  estimator  would  be: 

1.  parameterize  the  spectral  density  function  (  choose  a  model  ); 


2.  estimate  the  parameters  of  that  model;  and 


3.  compute  the  estimated  spectral  density  function  from  these  estimated  param¬ 
eters. 


The  parameterization  we  will  use  is  the  ARMA  spectral  model  defined  below. 
Consider  the  ARMA  spectral  model 


M<1  )y(t)  =  C{q  l)e(t) 


where 


e(t)  =  white  Gaussian  noise  with  zero  mean  and  variance  A2  (2.2) 
<7-1  =  unit  delay  operator  (  g-1y(<)  =  y{t  —  1)  )•  (2.3) 

y  and  e  are  scalars 


and  the  following  standard  assumptions  are  made: 


Assumption  2.1 


A(z)  •  C(z)  =  0  =>  |2|  >  1 


Assumption  2.2 


an4.c„ ^  0  and  (A(2),C(2)}  are  coprime  polynomials 

Together,  this  means  the  ARMA  representation  is  minimal,  stable  and  invertible. 
The  algorithm  studied  here  does  not  extend  to  cases  where  these  assumptions  do 
not  hold,  e.g.  the  case  of  sinusoids  in  white  noise  cannot  be  modelled  in  this  way, 
as  it  violates  Assumption  2.1.  However,  many  practical  cases  do  fall  in  the  category 
described  by  Assumptions  2.1  and  2.2. 

Note  also  that  we  need  to  know  the  orders  na  and  nc  of  the  polynomials  A  and 
C;  here  we  will  assume  that  these  are  given. 

Now  we  can  introduce: 

B(z)  =  C(z-')C(z)  (2.4) 

Under  Assumption  2.1,  C(z)  can  be  uniquely  determined  from  B(z),  and,  as  we 
shall  see  below,  B(z)  is  easier  to  estimate.  We  also  define 

rk  =  E{y(t)y(t  —  fc)}  =  the  covariance  of  y(t)  at  lag  k  (2.5) 

OO 

$(z)  =  ^2  rkz~k  =  the  spectral  density  of  y(t)  (2.6) 

k—  —  oo 

where  £{■}  denotes  the  expectation  operator  and  z  is  a  complex  variable. 

It  is  well  known  that 

,C(z)C(z~') 


$(2)  =  V 


(2.7) 


A(z)A(z~'1) 

The  problem  considered  here  is  to  estimate  $(2)  from  N  measurements  of  (yk)k- 
We  intend  to  carry  out  the  estimation  by  parameterizing  the  spectrum  and  then 


estimating  the  parameters  from  the  available  data.  There  axe  several  different  pos¬ 
sible  methods  of  parameterizing  the  spectrum.  Specifically,  as  [16]  and  [14]  suggest, 
one  can  parameterize  $  by 


{A  ,  ctj ,  *  •  • ,  dna,  Cj,  *  •  • ,  cnc} 

(2.8) 

or  by 

{r0,  y  ^na+nc} 

(2.9) 

or  by 

{^0,  *  *  *  )  1*ncy 

(2.10) 

We  parameterize  on  (2.9).  Finding  the  a  parameters  from  the  r  parameters  can  be 
done  by  solving  the  well-known  Yule  Walker  equation  [3]: 


•  ^nc+l- 


,  K  >  na  — f  nc 


(2.11) 


rK- na 


The  b  parameters  can  then  be  found  using  the  well-known  equality  [11]: 


bk  =  [a0  •  •  •  Qr 


(2.12) 


[  r|*-»«!  •••  j  [_  ano  j 

Thus  we  can  uniquely  convert  from  any  of  the  parametrizations  (2.8)  -  (2.10)  to  any 
other,  i.e.  they  are  mathematically  equivalent.  However,  as  [10]  discusses,  there  are 
good  reasons  for  using  the  second  parameterization  above;  therefore  we  define: 


®  —  [r0  '  '  '  *”»«  +  nc] 

As  both  [16]  and  [14]  show,  this  is  a  well-defined  parameterization. 


(2.13) 


*  * 


2.2  An  Asymptotically  Efficient  Algorithm 

Obviously,  the  accuracy  of  the  spectral  estimate  is  determined  by  the  accuracy  of 
the  parameter  estimate,  so  finding  the  parameter  estimator  with  the  lowest  possible 
variance  is  important  to  the  overall  accuracy. 

As  a  first  try,  we  could  consider  either  using  standard  unbiased  sample  covariance 
estimation: 

1  N-k 

fk=  N  -  k  ^  +  k )  i  =  (214) 

f-k  =  fk  (2.15) 

However,  the  fk  are  in  general  not  statistically  efficient  estimates,  and  can  in  fact 
have  very  bad  accuracy.  We  could  instead  consider  a  Maximum  Likelihood  (ML) 
autocorrelation  estimator:  this  will  be  statistically  efficient,  but  on  the  other  hand, 
will  also  be  computationally  expensive.  The  idea  considered  in  [16]  and  [14]  is  to 
improve  on  the  estimates  in  (2.14)  in  such  a  way  that  it  attains  some  of  the  desirable 
properties  of  ML;  specifically,  it  should  be  asymptotically  (  i.e.  for  N  — ♦  oo  ) 
statistically  efficient,  that  is, 

lim  NE{(R  -  R)(R  -  R)7} 

N-*oo 

should  be  minimal.  In  this  equation  R  is  the  vector 


for  some  integer  m,  and  R  is  the  estimate  of  R. 


V, 


l 


V 

V 


A 

>. 


As  we  axe  going  to  have  several  different  kinds  of  estimates,  the  following  general 
notation  is  used:  a  variable  with  a  tilde  (  ~  )  denotes  the  initial  estimate  of  that 
variable,  a  variable  with  a  circumflex  (  A  )  denotes  the  improved  estimate  of  that 
variable. 

2.3  The  Correction  Step  from  Yule  Walker  to  Approximate  Maximum 
Likelihood:  Heuristic  Presentation 

The  following  is  a  plausability  argument  for  the  correction  step  used  in  the 
algorithm:  a  more  formal  presentation  is  given  in  the  next  section. 

Consider  a  scalar  random  variable  8  with  unknown  expected  value  6.  We  would 
like  to  find  a  new  ran<  >m  variable  8  with  the  same  expected  value  8 ,  but  a  smaller 
variance.  Assume  that  we  also  have  a  random  variable  z  (  called  an  instrumental 
variable  )  with  expected  value  0  that  is  a  (  deterministic  )  function  of  8:  z  =  z(8)  and 
has  the  property  that  the  variance  of  z  increases  monotonically  with  the  variance 
of  8. 

The  above  problem  can  be  cast  into  the  language  of  linear  spaces.  Consider  the 
Hilbert  space  H  of  all  random  variables  with  zero  mean  and  finite  second  moment. 
The  addition  and  scalar  multiplication  operations  in  this  space  are  the  obvious  ones; 
the  inner  product  of  two  random  variables  x  and  y  (  denoted  <  x,y  >  )  is  defined 
to  be  E(xy)  (  [18]  shows  that  this  is  a  well-defined  inner  product  ).  Note  that  the 
norm  generated  by  the  inner  product  (  y/<  x,x  >  )  is  the  standard  deviation  of  the 
random  variable. 

We  have  two  vectors  2,  and  8  =f  8  -  8.  We  would  like  to  minimize  the  norm  of 
||0||,  but  are  hindered  by  the  fact  that  we  do  not  know  ||0j|  (  8  is  unknown  and  we 
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Figure  2.1:  Geometric  Intepretation  of  the  Effect  of  the  Instrumental  Variables  on 
the  Variance  of  the  Estimate. 

have  only  measurements  of  8  ).  However,  since  w  j  know  how  z  depends  on  6,  we 
can  estimate  the  inner  product  (  =  covariance  )  <  8,z  >,  and  standard  deviation 
y/<  z,  z  >  =  || z ||,  from  the  known  measurements  of  6  and  z.  This  gives  us  the 
strategy  of  subtracting  the  projection  on  the  subspace  spanned  by  z  from  6,  see 
Figure  2.1.  To  determine  the  projection  we  divide  z  by  its  norm  to  yield  et\  now 
the  projection  P  on  the  z-subspace  is  simply  the  inner  product  <  8,  e,  >  times  the 


vector  e,: 


P 8  =<  8 ,  e,  >  e. 


If  we  want  to  express  this  in  terms  of  z  instead  of  et,  we  get  a  factor  ||2||  in  the 
inner  product  (  <  8,  z  >=  ||z ||  <  8,et  >  ),  and  another  one  from  the  vector  at  the 


end  (  z  =  H-zlle*  ),  giving  a  total  of  ||z||2  =<  z,z  >;  this  has  to  be  divided  out  again: 


P9  = 


<  9,z  > 
- z 

<  z,z  > 


Finally,  we  need  9  —  P9: 


0-P9  =  9- 


<  9,  z  > 

<  z,z  >Z 


which  gives: 


0-9=9 -9- 


<9,z  > 

- z 

<  z,z  > 


(2.16) 


where  fortunately  the  9  terms  cancel  out,  leaving  us  only  with  terms  we  can  estimate 


from  the  given  data.  In  probabilistic  terms,  equation  (2.16)  is:  again: 


9  =  9- 


E  (9z) 
E  («)* 


For  vectors  this  generalizes  to: 

9  =  9-E{9zT)(E{zzT))-xz  (2.17) 


2.4  The  Correction  Step:  Overview  and  Implementation  Issues 

In  this  section  we  present  the  correction  step  without  proof;  some  proofs  are 
given  in  Appendix  A. 

The  general  strategy  for  the  correction  step  is  as  follow’s:  we  will  try  to  find  a 
random  vector  X ,  that  is  completely  determined  by  the  N  available  data  samples, 
and  such  that  for  N  —*  oc  the  distribution  of  X  can  be  completely  specified  from  9. 
Furthermore,  using  to  indicate  a  Gaussian  distribution  with  mean  vector 

m  and  covariance  matrix  £,  X  should  be  such  that: 

'/KIX  -  X)  JZl  A'lO.tV)  (2.18) 
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where 


e 

y 

0 

and  where  9  is  as  defined  in  (2.13).  The  covariance  matrix  W,  which  is  assumed  to 
be  nonsingular,  may  depend  on  9.  Finally,  assume  that  an  estimate  W  of  W  can 
be  calculated  from  the  available  data,  and  that  |TF  —  W\  is  0(\/\/~N)  l. 

It  is  shown  in  [16]  that  under  these  circumstances  we  have 


where 


W 


X 


Wn 

Wn 

tt'S 

w22 

1m  x  1 


(2.19) 


(2.20) 


and 


where  we  assume 


Wu 

=  E  (9-9)(9-9) 

T  nO  x  n9 

(2.21) 

W12 

=  E z{9  -  9)T 

n9  x  (m  —  nO) 

(2.22) 

w22 

=  E  zzT 

(m  —  n6)  x  (m  —  n9) 

(2.23) 

m  — 

dim  X 

n6  = 

dim  9 

m  >  na 

+  nc+  1 

0{e)'  means  the  standard  deviation  is  equal  to  /\  t,  where  K  is  independent  of  i. 


and  where  z  is  the  same  random  vector  discussed  in  the  previous  section.  (  z  is 
often  called  the  instrumental  variable  vector  or  the  instrumental  variables  ) 


B 

v. 

V. 


In  Appendix  A  we  show  that 

ft  =  9  -  WuW^z  (2.24) 

is  an  approximate  ML  estimate  of  9.  However,  this  is  in  general  not  implementable 
since  H'  may  depend  on  8  and  therefore  be  unknown.  However,  since  2  is  0(l/\/N ), 
we  can  replace  \VtJ  by  their  consistent  estimates  W without  affecting  the  order  of 
the  approximation  (  see  [16]  for  details  and  proof  ).  This  gives: 

6  =  6-  WltWgz  (2.25) 

It  remains  to  choose  the  instrumental  variable  vector  z :  we  take  z  equal  to: 

2  =  [Zl  ■  ■  ‘  Zm-ne\T 

na  na 

Zk  =  ^  ^  Qi^wc-f  na+k-i-jQ-i  (2.26) 

(=0 ;=0 

which  is  clearly  a  deterministic  function  of  9  through  (2.11).  As  [17]  discusses,  2 
asymptotically  has  zero  mean  and  normal  distribution,  so  it  fulfills  the  requirements 
of  the  previous  section.  This  specific  form  of  the  vector  2  leads  to  a  relatively  simple 
distribution  of  X]  it  was  introduced  by  Walker  [17],  who  proves  that  it  is  optimal 
in  the  sense  that  if  one  chooses  a  2  vector  of  length  n,  there  does  not  exist  any  zero 
mean  n-vector  that  is  more  correlated  with  8 ;  in  other  words,  this  choice  will  take 
out  the  maximum  amount  of  variance  from  the  estimate  9  (  see  Figure  2.1  ). 

As  is  shown  in  Appendix  A  of  [16],  this  2  achieves  the  properties  of  (2.18),  with 

[Hn]ij  =  E{[u(t  -  z)  +  v(t  +  i)]u(t  - »}  ,  i,j  =  0,. . .  ,n8  -  1  (2.27) 
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I 


,  J.-r. 


v.v 


(2.29) 


[H^ki  =  E{A*(q-')v(t  -  ~  i)) 

nc 

=  the  coefficient  of  z'~>  in  [  ^  ft**-*]2  ,  t,  j  =  1, . . .  ,  m  —  n6 


[W12]j*  =  E{A2(g  ^(i -f  na  +  nc  -  j)  +  u(<  +  na  +  nc  +  j]t;(t  -  fc)} 

=  otk+j  ,  k  =  -ri0 

a,  =  the  coefficient  of  zs  in  the  long  division  of 


(2.30) 


(2.31) 


Computation  of  some  of  the  elements  of  the  Wnm  matrices  can  be  avoided  by  noting 
that  they  are  always  equal  to  0,  see  [14]  for  details. 

It  is  shown  in  [16]  that  as  N  — ►  oo  the  covariance  matrix  of  the  normalized 
estimation  error  \/N (0  —  9)  is  given  by 


Pi  =  W„  -  WuWtfWS 


(2.32) 


furthermore  we  have 


pe  >  pe 

mi  —  *  mj 


for  m2  >  mi 


(2.33) 


From  equation  (2.33)  it  follows  that  (P^)m  has  a  limit  when  m  — ►  oo,  which  we  will 
denote  by  P^.  The  following  equality  holds 


Pi  =  wu  -  nnT  =  P°n , 


(2.34) 


Qt,  =  E{C2(g  1)[e(t-i)  +  e(t  +  0]  TT,  ITT e(*  ~  nc  ~  na  ~  ■?)}  (2-35) 


where 


However,  the  asymptotic  results  depend  on  the  fact  that  m  is  a  negligable  fraction 
of  Ar,  so  for  finite  data  lengths  we  must  be  careful  not  to  choose  m  too  large.  This 
makes  the  convergence  rate  of  (P^)m  to  P ^  important:  it  can  be  shown  [16]  that 
if  C(z )  does  not  have  roots  with  approximately  unit  magnitude,  the  convergence 
will  be  fast;  otherwise  it  will  be  slow,  and  we  will  need  to  use  large  values  of  m  in 
order  to  approach  P^,  which  is  not  possible  unless  we  have  a  very  long  data  record 
available  (  since  m/N  ~  0  must  hold  ). 

We  can  find  estimates  of  a,  by  substituting  our  estimate  8  for  the  autocorre¬ 
lations  in  the  Yule  Walker  equations  (2.11).  The  resulting  estimate  will  have  an 
asymptotic  (  for  N  — ♦  oo  )  covariance  matrix  which  can  be  shown  to  be  equal  to 


=R-\Qn-Q„W£QTn)R- 


(2.36) 


where 


[Qn\ij  =  E{4(9  l)v(t-i)A(q  l)v(t  -  j)} 

[Qn)i,  =  E{A2(q-')v(t  -  i)>i(g"1)u(t  -  na  -;')} 


(2.37) 

(2.38) 

(2.39) 


and  R  is  the  matrix  in  the  YW  equation  (2.11).  Again  we  have  [16] 


pa  ■>  pa 
■*  mi  —  m2 


,  for  m2  >  mj 


(2.40) 


Therefore,  (P^)m  must  have  a  limit  P£,  which  can  be  shown  to  be  equal  to  [16] 


p^  =  R-'(Q„-rrT)R-'  =PaCR 


(2.41) 


where 

rtJ  =  E{C2(£7_1)e(<  -  i)  e(t  -  na-j )}  *  =  1, . . . ,  na 

A(q  ) 

j  =  I,-  -  •  ,2 nc 

T  =  (r0)  (2.42) 

It  has  been  shown  [16]  that  the  covariance  matrices  (  equation  (2.34)  )  and 
P £,  (  equation  (2.41)  )  are  equal  to  the  CRLB  for  6  respectively  a,  that  is,  the 
algorithm  is  asymptotically  statistically  efficient.  Notice  however,  that  the  proofs 
[16]  assume  that  the  limits  m  — >  oo  and  N  — *  oo  are  taken  in  such  a  way  that 
m/N  — *  0,  so  when  implementing  the  algorithm,  m  must  be  a  negligible  fraction  of 
N. 

The  correction  step  could  of  course  be  repeated:  we  could  use  the  new  estimates 
to  get  improved  values  for  2  and  W,j  and  use  them  in  (2.25)  (  note  that  we  would 
still  use  the  original  autocorrelations  in  the  right  hand  side  since  the  correction  step 
is  designed  to  correct  the  original  estimates,  not  the  improved  ones  ).  Since  the 
algorithm  is  asymptotically  efficient,  this  will  not  improve  the  asymptotic  properties, 
but,  as  [16]  suggests,  it  could  improve  the  short  data  length  behaviour.  This  is 
discussed  in  section  3.3. 

As  is  shown  in  [14],  the  algorithm  is  well  conditioned  in  the  limits  m,N  -*  oo. 
The  paper  [14]  gives  bounds  for  the  numerical  condition  number  of  the  algorithm 
(  as  a  function  of  the  parameters  a  and  c  )  in  the  limiting  case. 


Table  2.1:  Summary  of  the  Algorithm 

1.  (a)  Use  (2.14)  to  find  f 

(b)  Use  (2.11)  with  r  in  place  of  r  to  find  a(  in  a  least  squares  sense  if  we 
choose  K  >  na  +  nc  ) 

(c)  Use  (2.12)  with  a  and  r  in  place  of  a  to  find  b 

2.  Use  f,  5,  b  to  find  f  using 

(a)  (2.26)  to  find  z 

(b)  (2.27)-(2.31)  to  find  Wu  and  W22 

( c )  (2.25)  to  find  8  from  z,  Wij  and  i 

3.  (a)  Use  (2.11)  with  r  in  place  of  r  to  find  a  (  again  possibly  in  a  least  squares 

sense  ) 

(b)  Use  (2.12)  with  a  and  r  in  place  of  a  and  r  to  find  b 

(c)  Use  a  and  b  in  (2.6)  to  obtain  the  spectral  estimate  $(e'7u/) 

2.5  Summary  of  the  Algorithm 

We  can  now  summarize  the  algorithm  see  Table  2.5  (  for  a  good  overview  of  the 
computational  issues  see  [14]  ). 

As  an  overview  of  the  properties  of  the  algorithm,  we  know  that  it  is  statistically 
efficient  when  both  TV  — ►  oo  and  m  — >  oc,  and  we  know  that  the  asymptotic 
covariance  matrix  of  both  a  and  8  is  monotonically  non-increasing  as  a  function  m, 
but  may  converge  very  slowly  if  C(z )  has  roots  near  the  unit  circle.  Furthermore, 


for  N  — ♦  oo  the  algorithm  is  numerically  well- conditioned. 

For  finite  data  lengths  we  know  that  we  need  m  N,  which  (  in  cases  of  slow 
convergence  )  may  prevent  us  from  getting  close  to  statistical  efficiency.  Moreover, 
the  algorithm  need  no  longer  be  well-conditioned.  Finally,  since  we  no  longer  have 
efficiency,  the  estimates  of  6 ,  a,  and  bj  may  have  different  statistical  properties  in 


3.  A  Simulation  Study  of  the  Finite-Data  Length  Behaviour  of  the 

Algorithm 


This  chapter  reports  on  the  behaviour  of  the  algorithm  for  finite  data-lengths, 
and  discusses  the  various  aspects  and  problems  that  are  seen. 

It  is  known  [16]  that  the  algorithm  is  both  statistically  efficient  and  numerically 
well- conditioned  in  the  limiting  case  N,m  —>  oo;  m/7V  — +  0.  For  finite  m  and 
N  — y  oo  we  know  [16]  the  behaviour  of  the  parameter  variances  as  a  function  of  m, 
and  for  processes  that  do  not  have  zeros  close  to  the  unit  circle  we  know  [16]  that 
even  for  “low”  values  of  m  these  variances  are  close  to  the  CRLB.  For  finite  m  we  do 
not  have  any  results  on  the  numerical  condition  of  the  algorithm.  For  both  N  and 
m  finite,  we  know  neither  the  numerical  condition  nor  the  statistical  efficiency  of 
the  algorithm.  In  this  chapter  we  investigate  the  behaviour  of  parameter  variances 
for  the  N ,  m  finite  case  through  Monte  Carlo  simulations. 

3.1  An  Overview  of  Finite-Data  Length  Effects  to  be  Investigated 

For  this  study  the  observed  behaviour  of  the  parameter  variances  will  depend 
not  only  on  the  particular  ARMA  process,  but  also  on  two  parameters  which  must 
be  chosen:  the  number  of  postiterations,  and  the  number  of  instrumental  variables 
(  i.e.  the  value  of  m  ). 

It  is  stated  in  the  previous  chapter  that  asymptotically,  one  postiteration  is 
enough  to  achieve  the  minimum  possible  variances.  However,  in  [16]  it  is  suggested 
that  for  short  data  lengths  multiple  postiterations  might  help  to  improve  the  sta¬ 
tistical  quality  of  the  estimate. 
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In  addition  we  know  that  as  N  —*  oo,  increasing  the  number  of  instrumental 
variables  monotonically  improves  the  quality  of  the  estimate.  However,  this  proof 
depends  on  the  fact  that  the  number  of  instruments  is  an  infinitesimal  fraction  of 
the  data  length;  therefore,  for  finite  data  lengths  we  can  expect  deviations  from  the 
theoritically  predicted  behaviour  if  we  choose  the  number  of  instruments  to  be  too 
large. 

In  order  to  get  a  good  insight  in  the  finite  data  length  behaviour  of  the  algorithm, 
we  need  to  use  several  data  lengths,  and  consider  several  processes,  of  different 
order,  and  try  to  find  out  how  the  variances  of  the  estimated  parameters  change  as 
a  function  of  the  chosen  data  length  for  each  process. 

A  specific  problem  we  may  encounter  is  autocorrelation  estimates  that  are  not 
nonnegative  definite  (NND),  or,  equivalently,  spectral  estimates  that  are  not  non¬ 
negative  (NN)  for  all  frequencies.  The  true  autocorrelation  sequences  are  always 
NND,  and  therefore  the  true  spectra  always  NN,  but  they  can  come  arbitrarily 
close  to  not  being  so,  in  which  case  the  uncertainty  in  the  estimates  may  cause  the 
estimated  autocorrelations  and  spectra  not  to  be  NND  and  NN,  respectively.  As  an 
example:  the  sequence  (e,0,0, ...)  is  NND  for  any  t  >  0,  but  is  not  NND  for  any 
e  <  0. 

In  section  4.2  a  strategy  for  repairing  this  problem  is  studied.  It  relics  on  the 
fact  that  an  autocorrelation  sequence  that  is  “close”  to  not  being  NND  will  have 
zeros  (  roots  of  the  C  polynomial  )  close  to  the  unit  circle;  therefore,  the  problem 
can  be  alleviated  by  moving  all  zero  estimates  which  are  too  close  to  the  unit  circle 
farther  away  from  it. 


3.2  The  Initial  Monte  Carlo  Simulations 


This  section  details  the  design  and  results  of  the  first  simulations. 


3.2.1  Design  of  the  Simulations 


The  following  processes  have  been  used  in  the  simulations  (  see  Table  3.1  ). 
Table  3.1  shows  that  the  processes  called  Example  1  and  Example  2  have  poles 
close  to  the  unit  circle,  but  no  zeros  close  to  the  unit  circle,  while  for  Examples  3 
and  4  the  reverse  situation  occurs.  Examples  1  and  3  have  been  taken  from  [6]  and 

[S]- 


We  expect  estimates  from  the  data  sets  for  Examples  1  and  2  to  behave  like 
the  asymptotic  case  (  N  — ►  oo  )  for  relatively  small  data  lengths,  and  estimates  for 
Examples  3  and  4  to  need  large  datas  lengths  for  asymptotic  behaviour  because  the 
analysis  of  the  algorithm  predicts  that  zeros  close  to  unit  magnitude  will  seriously 
affect  convergence  in  m  of  the  variances  to  their  asymptotic  values  (  [16]  ),  thus 
necessitating  large  N,  as  m/N  0  must  hold.  We  also  expect  that  Examples  3  and 
4  will  need  larger  data  lengths  even  for  fixed  m,  as  the  numerical  condition  of  the 
algorithm  also  is  affected  by  zeros  close  to  the  unit  circle. 

For  the  other  two  processes  (  Examples  3  and  4  )  wre  expect  the  algorithm  to 
have  a  slow  convergence,  a  higher  asymptotic  variance  and  less  numerical  stability, 
these  examples  also  have  varying  orders  (  two  are  ARMA(1,1)  processes,  one  is  an 
ARMA(4,3),  and  one  an  ARMA(4,4)  process  ).  Several  other  processes,  especially 
ARM A(  1,1)  cases,  were  looked  at,  but  are  not  reported  on  here,  as  the  results  from 
conclusions  from  those  simulations  support  the  conclusions  presented  here. 


Simulations  were  carried  out  using  N  =  75,  200,  750,  and  2000  data  points. 


Table  3.1:  The  ARM  A  Processes  Used  in  the  Simulations 


Example 

1 

2 

3 

B 

na 

4 

D 

4 

B 

TIC 

4 

D 

3 

1 

D 

2.8271 

i 

0.017258 

1 

ao 

1 

l 

1 

1 

a\ 

0.1 

-0.9 

-1.3136 

-0.5 

a2 

1.66 

1.4401 

03 

0.093 

-1.0919 

CL  4 

0.S649 

0.83527 

Co 

1 

1 

1 

1 

Cl 

0.0226 

-0.5 

0.1792 

-0.9 

C2 

0.8175 

0.8202 

C3 

0.00649 

0.2676 

c4 

0.07637 

poles 

-0.25  ±  £0.9314 

0.9 

-.1253  ±  tO.9159 

0.5 

0.2  ±  zO.9434 

0.7821  ±  i'0.6047 

zeros 

-0.00079  ±  i0.8425 

0.5 

0.0658  ±  i0.9256 

0.9 

-0.0034  ±  i'Q.328 

-0.3108 

Initially  one  postiteration  was  done  to  determine  a  good  range  for  the  number  of 
instruments.  This  number  was  between  1  ...  10.  The  effect  of  multiple  postiterations 
for  these  cases  was  then  investigated. 

The  results  of  the  simulations  are  characterized  by  two  kinds  of  plots:  plots  of 
the  estimated  spectrum  and  plots  of  an  a  (  coefficient  of  the  estimated  denominator 
polynomial  ),  b  (  coefficient  of  the  estimated  numerator  polynomial),  or  r  (  estimated 
autocorrelation  coefficient  )  parameter  versus  either  the  number  of  instruments  or 
the  number  of  postiterations. 

The  spectral  plots  shown  each  have  four  curves: 

•  The  mean  spectrum:  the  mean  of  all  estimates  in  a  given  Monte-Carlo  run. 

•  The  mean  plus  the  normalized  standard  deviation  and  mean  minus  normalized 
standard  deviation  of  the  spectral  estimates.  The  standard  deviation  of  the 
spectral  estimates  is  computed,  multiplied  by  the  square  root  of  the  data 
length  and  the  result  added  to,  respectively  subtracted  from  the  mean  of  the 
spectral  estimates. 

•  The  true  spectrum. 

The  plots  of  an  a,  b,  or  r  parameter  versus  nz  each  have  at  least  two  curves: 
the  normalized  variance  and  the  sum-squared  error  (  any  difference  between  these 
two  curves  indicates  bias  in  the  estimate  ),  and,  where  appropriate,  the  theoretical 
variance  and  the  CRLB,  also  normalized  with  the  the  data  length. 


3.2.2  Results  for  the  Four  Choosen  Examples  at  Several  Datalengths 


and  Different  Algorithm  Settings 

Below  we  present  results  from  simulations  of  the  four  example  processes.  The 
results  naturally  cluster  to  two  groups:  the  low  order  cases  (  Example  2  and  Example 
4  )  and  the  high  order  cases  (  Examples  1  and  3  ). 

Low  Order  Cases  (  Examples  2  and  4  ) 

The  resulting  spectral  estimates  for  example  2  confirm  this  behaviour.  For 
increasing  nz,  example  2  improves  significantly  over  the  pure  YW  estimate  nz  —  0; 
as  can  be  seen  from  Fig.  B.l,  where  we  see  the  envelope  formed  by  the  two  curves 
"mean  minus  standard  deviation”  and  “mean  plus  standard  deviation”  become 
narrower  as  nz  increases,  the  most  noticable  narrowing  occuring  as  nz  increases 
from  0  to  1  and  then  from  1  to  2.  Example  4  also  improves,  but  has  a  much  higher 
variance  (  the  envelope  remains  wide  ),  as  can  be  seen  from  Fig.  B.3.  Problems  also 
occur  with  the  spectral  estimate  being  non  positive  definite,  even  for  higher  values 
of  nz.  This  is  seen  in  the  ragged  shape  of  the  envelopes  (Fig.  B.3  ).  As  can  be 
seen  from  Fig.  B.4,  the  variances  are  better  at  750  data  points  than  at  200,  but  the 
improvement  is  small.  On  the  other  hand,  the  spectrum  of  example  2  is  still  being 
estimated  quite  well  at  75  points,  as  can  be  seen  from  Fig.  B.2.  When  looking  at 
these  Figures,  it  is  important  to  remember  that  the  variances  are  normalized  by  a 
factor  Ar,  so  for  instanace  “small  improvement”  between  200  and  750  datapoints 
for  Example  4  means  that  the  variances  are  decreasing  as  jj. 

Comparing  N  =  200  curves  with  the  same  nz  for  example  2  and  example  4  ( 
Figs.  B.l  and  B.3  ),  we  see  that  the  latter  curves  have  a  far  wider  envelope.  We 


also  note  the  ragged  shape  of  the  envelope  for  example  4:  this  typically  indicates 
that  some  of  the  Monte  Carlo  iterations  resulted  in  a  spectral  estimate  that  is  not 
positive  for  all  frequencies;  plots  of  selected  individual  estimates  (  not  reproduced 
here  )  confirm  this. 

Overall,  we  see  that: 

•  For  a  given  X  and  nz,  example  2  has  lower  variances  than  example  4. 

•  For  a  given  X  the  variance  decreases  as  nz  increases,  but  the  effect  is  much 
more  pronounced  for  example  2  than  for  example  4.  The  decrease  of  variance 
is  greatest  for  low  values  of  nz. 

•  For  a  given  nz  the  "ariance  decreases  as  N  increases,  rapidly  for  example  2, 
but  slowly  for  example  4. 

The  statement  in  the  first  paragraph  of  this  subsection  that  this  behaviour  can 
be  explained  from  the  zero  positions,  as  predicted  by  the  asymptotic  analysis,  is 
supported  by  simulation  results  for  several  other  first  order  processes,  with  different 
pole  and  zero  positions  (  results  not  reproduced  here  ),  which  clearly  show  that  the 
position  of  the  pole  does  not  influence  convergence,  but  the  closer  the  zero  is  to  the 
unit  circle,  the  slower  the  convergence  becomes. 

The  ARMA  parameter  estimates  do  not  achieve  the  same  quality  as  the  spectral 
estimate.  Not  only  does  example  4  have  a  very  slow  convergence  (  a  problem  that 
apparently  affects  the  ARMA  parameters  even  more  than  the  spectral  estimate  ), 
but  example  2  now  also  has  a  convergence  problem;  the  variances  go  down  at  first, 
and  then,  as  nz  is  increased  further,  they  start  increasing.  This  is  contrary  to  what 
the  spectra  show.  See  Figures  B.5-B.10.  Looking  at  the  plots  we  see  that  curves  for 


the  a  and  b  parameters  are  similar  in  shape.  For  fixed  N,  the  variance  goes  down 
rapidly  as  nz  increases  from  0,1  to  2.  We  also  note  that  the  algorithm  computes 
estimates  that  have  very  little  bias  since  the  variance  and  sum  squared  error  curves 
are  indistinguishable  (  there  axe  differences,  starting  in  the  5th  significant  digit  ). 
A  clear  almost  flat  stretch  can  be  seen  in  all  curves  in  the  nz  =  3, . . . ,  8  range, 
indicating  that  the  estimator  performance  is  insensitive  to  the  choice  of  nz  in  this 
range.  Finally,  the  rise  in  the  curves  for  high  nz  values  is  mainly  due  to  a  few 
outliers,  not  a  general  deterioration  of  the  estimates.  This  can  be  seen  from  a 
listing  the  20  highest  error  norms.  For  each  Monte  Carlo  iteration,  the  program 
computes 


and  at  the  end  of  a  run,  the  20  highest  values  are  returned.  For  a  first  order  process, 
typically  18  or  19  of  the  20  error  norms  have  a  value  of  as  0.5,  and  the  remaining  1 
or  2  having  a  value  of  %  17. 

The  curves  in  Figures  B.5-B.10  show  the  following: 

•  Just  as  for  the  spectral  estimates,  for  a  given  Ar,  the  parameter  variances 
decrease  as  nz  increases,  with  the  largest  decreases  occuring  for  nz  increasing 
from  0  to  1  and  from  1  to  2.  This  effect  is  stronger  in  example  2  than  in 
example  4. 

•  Just  as  for  the  spectra,  for  a  given  nr,  the  parameter  variance  decreases  as  N 
increases,  more  so  for  example  2  than  for  example  4.  This  can  be  observed 
by  comparing  curves  for  the  same  parameter  but  different  data  length  and 
reading  off  variances  for  the  same  nz.  See  Fig.  B.5. 


•  Bias  in  the  parameter  estimates  is  negligible. 

High  Order  Cases  (  Examples  1  and  3  ) 

The  results  for  example  1  are  similar  to  those  for  the  low  order  cases.  (  Figure 
B.12  )  If  we  compare  the  spectra  for  nz  —  0,1,2  we  see  a  rapid  improvement, 
which  becomes  a  slow  improvement  for  nz  >  3.  However,  unlike  example  2  and 
example  4,  we  see  here  that  there  is  improvement  for  nz  >  3;  it  is  much  slower 
than  for  the  lower  values  of  nz ,  but  comparing  nz  =  7  with  nz  =  3  for  Example  1 
shows  a  clear  improvement  B.12  For  nz  =  9, 10  we  see  the  the  envelope  wdden  again. 
Both  examples  show  the  ragged  envelopes  characteristic  of  non  positive  spectra,  but 
increasing  nz  clearly  decreases  the  effect,  as  can  be  seen  by  comparing  individual 
spectral  estimates. 

Comparing  different  data  lengths  (  Figures  B.11-B.17,  giving  spectral  plots  for 
N  =  75, 200,  750, 2000  for  Example  1  and  N  =  200,  750, 2000  for  Example  3;  for  both 
examples  results  for  nz  =  0, 1,2,3,  7, 10  are  given.  Example  3  does  not  have  plots 
for  N  =  75  because  they  are  so  bad  that  they  no  longer  show  any  information  )  we 
see  slow  convergence  of  the  variance  as  a  function  of  the  datalength,  and  variances 
that  are  clearly  worse  than  example  4  for  N  =  75:  we  obviously  need  longer  data 
records  for  these  processes. 

As  can  be  seen  from  Fig.  B.15,  increasing  nz  too  much  will  cause  the  variances  to 
increase:  the  Figure  compares  nz=7  and  10  for  2000  data  points,  and  the  envelope 
for  nz=  10  is  clearly  wider  than  the  envelope  for  nz=  7. 

Considering  example  3,  we  see  that  even  for  high  data  lengths  there  is  little 
improvement  in  the  variances  as  we  let  nz  increase  (  compare  for  example  nz  =  0 


or  3  for  2000  data  points  Fig.  B.17).  Again  this  shows  that  whereas  poles  close  to 
the  unit  circle  have  little  effect  (  example  1  has  poles  with  magnitude  greater  than 
0.95  ),  zeros  that  close  to  the  unit  circle  cause  the  decrease  of  the  variances  as  a 
function  of  nz  to  be  very  slow.  (  example  3  has  zeros  with  magnitude  greater  than 
0.95  ). 

The  ARMA  parameter  estimates  (  Figures  B.18  and  B.19  )  again  show  the 
pattern  of  rapidly  decreasing  variance  for  low  nz ,  followed  by  an  almost  flat  stretch, 
followed  by  a  very  rapid  rise:  this  rise  is  much  steeper  for  examples  1  and  3  than 
for  examples  2  and  4.  The  plotted  parameters  axe  typical  cases;  Figure  B.19  also 
shows  the  theoritical  variance  and  the  CRLB. 

3.3  Results  of  Simulations  with  the  Algorithm  Modified  to  Do  Multiple 
Postiterations 

One  proposed  method  for  improving  statistical  quality  for  small  N  is  to  use 
multiple  postiterations.  However,  it  turns  out  that  this  method  does  not  work: 
more  than  one  postiteration  increases  the  variances.  Our  simulations  gave  floating 
point  overflow  in  as  few  as  five  postiterations. 

Analysis  of  the  individual  estimates  of  several  Monte  Carlo  runs  showed  that  the 
problem  lies  in  the  high  sensitivity  of  the  estimated  a  and  b  parameters  to  outliers 
in  the  estimated  autocorrelations.  As  was  stated  above,  the  correction  step  of 
the  algorithm  introduces  outliers  (  typically  |-5%  of  the  Monte  Carlo  iterations  ), 
and  when  these  are  fed  back  into  the  algorithm  for  the  next  postiteration,  the 
resulting  estimate  poorer  than  the  original  estimate,  thus  causing  rapid  divergence 
as  a  function  of  the  number  of  postiterations. 


3.4  Summary 


Based  on  simulations,  it  was  shown  that  the  algorithm  shows  the  predicted  slow 
convergence  for  processes  whose  C  polynomials  have  zeros  close  to  the  unit  circle. 
The  requirement  that  m<A'  was  also  confirmed.  The  simulation  results  indicate 
that  even  m  =  0.05iV  may  to  be  “too  large”. 

Furthermore,  we  have  found  that  the  spectrum  is  less  sensitive  to  the  effects 
of  high  values  of  nz  than  the  ARMA  parameters.  On  the  other  hand,  for  higher 
process  orders,  the  spectral  estimates  continue  to  improve  for  higher  values  of  nz 
than  the  ARMA  estimates  do;  that  is,  while  the  ARMA  estimates  for  nz  —  4  and 
5,  say,  are  almost  identical,  the  spectral  estimate  for  nz  =  5  is  slightly  better  than 
that  for  nz  =  4. 

For  all  of  the  processes  considered,  most  of  the  improvement  of  the  algorithm 
over  the  normal  Yule  Walker  method  is  gained  through  the  addition  of  the  first  2 
instruments.  After  that,  improvements  with  additional  instruments  are  either  much 
smaller,  or  even  negligible. 

Also,  for  all  processes,  the  variances  of  the  estimated  parameters  increase  with 
larger  values  of  nz\  for  example  2  and  example  4  this  increase  starts  gradually,  but 
for  examples  1  and  3  it  is  very  sudden. 

One  method  for  preventing  the  increase  in  variance  for  high  nz  is  to  choose  a 
low  value  for  nz.  For  low  order  processes  this  seems  no  problem,  as  choices  of  2  or 
3  instruments  seems  to  be  all  that  is  needed  to  achieve  the  minimum  variance  the 
algorithm  can  realize.  For  high  order  proceses  on  the  other  hand,  and  especially  if 
one  is  mainly  interested  in  obtaining  accurate  spectral  estimates,  a  larger  number  of 
instruments  may  bo  needed,  leading  to  potential  problems  with  the  variance  because 


a  choice  that  is  too  high  can  result  in  a  variance  that  is  far  higher  than  even  than 
the  initial  Yule  Walker  estimate. 

A  second  improvement  method  might  be  to  use  extra  postiterations  for  lower 
datalengths.  However,  as  we  have  seen  above,  this  approach  has  problems  with 
variances  increasing  instead  of  decreasing  that  are  even  worse  than  the  problems  of 
the  original  algorithm. 

After  that  one  might  consider  various  “repair”  strategies,  perhaps  based  on 
outlier  elimination,  such  a  method  would  modify  “bad”  estimates  in  such  a  way  as 
to  numerically  stabilize  the  algorithm  before  the  correction  step  is  executed. 

However,  it  was  found  that  some  Monte  Carlo  iterations  are  much  more  robust 
than  others;  in  other  words,  if  we  could  change  the  algorithm  such  that  it  would 
decide  on  the  number  of  instruments  and  postiterations  for  each  individual  Monte 
Carlo  iteration,  performance  might  improve.  Such  a  method  can  also  potentially 
eliminate  the  problems  associated  with  using  a  large  number  of  instruments.  An¬ 
other  possibility  is  to  move  estimated  poles  and/or  zeros  that  are  close  to  the  unit 
circle  slightly  away  from  it,  so  as  to  reduce  the  numerical  problems  associated  with 
poles  and  zeros  close  to  the  unit  circle.  Finally,  the  strategies  that  have  been  used 
to  improve  Yule- Walker  methods,  such  as  using  overdetermined  Yule  Walker  equa¬ 
tions  (  i.e.  use  more  than  the  required  minimum  of  Yule  WTalker  equations  and  solve 
them  in  a  least  squares  sense  )  can  also  be  used.  While  using  overdetermined  Yule 
Walker  equations  does  not  guarantee  improvement,  extensive  simulations  [4],  [5] 
have  shown  that  improvement  is  generally  obtained  when  the  sequence  of  autocor¬ 
relations  decreases  slowly.  The  next  chapter  deals  with  these  strategies. 


4.  Investigation  of  Algorithm  Modifications  to  Improve  the  Small 

Data  Length  Behaviour 


In  this  chapter  we  consider  various  modifications  to  the  algorithm  to  improve 
its  small  data  length  performance.  First  we  consider  the  automatic  selection  of  the 
number  of  instruments  and  postiterations.  This  is  motivated  by  the  observation 
that  most  of  the  decrease  in  performance  for  high  nz  and/or  high  number  of  postit¬ 
erations  is  due  to  outliers,  therefore  selecting  an  appropriate  number  of  instruments 
and  postiterations  for  each  Monte  Carlo  iteration  separately  should  improve  per¬ 
formance.  Since  it  is  possible  to  make  the  algorithm  recursive  in  nz  in  a  highly 
computationally  efficient  manner,  automatic  selection  can  be  achieved  at  almost 
no  additional  computational  cost.  Second  we  consider  moving  poles  and/or  zeros 
which  are  close  to  the  unit  circle,  to  avoid  the  numerical  ill-conditioning  problems. 
Finally,  we  study  the  effect  of  using  overdetermined  Yule  Walker  equations  instead 
of  minimal  ones,  since  this  improves  the  performance  of  the  Yule  Walker  estimator 
in  many  cases  (  [4],  [5]  ). 

4.1  Automatic  Selection  of  the  Number  of  Instruments  and  Postitera¬ 
tions 

This  section  first  discusses  how  to  make  the  estimator  recursive  on  nz  so  that 
the  automatic  selection  strategy  can  be  implemented  efficiently;  then  it  discusses 
how  to  find  the  criteria  for  selecting  values  of  nz  and  the  number  of  postiterations 
i  in  a  self  adjusting  strategy.  Finally  it  compares  the  performance  of  this  strategy 
with  that  of  the  fixed  value  strategy. 


Making  the  algorithm  self-adjusting  in  the  nz  and  i  parameters  is  desirable  for 
the  following  reasons:  The  best  value  of  nz  is  dependent  on  the  process  being 
estimated.  There  are  two  ways  to  set  it  manually.  First,  one  could  use  a  priori 
knowledge  of  the  process  (  which  is  often  unrealistic  ).  Second,  one  could  tune 
the  algorithm  by  using  the  estimator  on  multiple  data  records  of  the  process  for 
several  values  of  nz  (  which  is  often  impractical  ).  A  choice  of  nz  could  then  be 
based  on  the  expirimental  variance  data  gathered  in  this  way.  A  self-search  would 
avoid  both  these  problems.  There  is  evidence  that  the  optimal  value  of  nz  varies 
considerably  even  for  different  realizations  of  the  same  process  (  by  as  much  as  a 
2-7  range  ),  making  separate  tuning  for  each  realization  attractive.  In  many  cases 
it  would  be  desirable,  or  even  necessary,  to  have  the  algorithm  do  this.  Due  to  the 
numerical  instabilities  described  in  Section  3.3  we  will  not  realize  any  advantage 
from  postiterations  beyond  the  first  one,  unless  we  use  only  those  cases  with  low 
error  norms  in  multiple  postiterations. 

4.1.1  A  Recursive  in  nz  Implementation 

The  algorithm  obtains  an  initial  estimate  (  steps  1.1  through  1.3  in  Table  2.5  ) 
and  then  corrects  it  by  first  correcting  the  f,  (  steps  2.1  through  2.3  )  and  then 
revising  the  original  ARMA  parameter  estimates  using  the  corrected  f,  (  steps 

3.1  and  3.2  ).  In  step  2.3  we  need  to  find  which  we  compute  by  solving 

W22WZ  =  2  for  wz.  It  is  in  the  second  stage  (  steps  2.1  through  2.3  )  that  the 
instrumental  variables  are  used.  It  turns  out  that  it  is  possible  to  carry  out  steps 

2.1  and  2.2  in  a  recursive  in  nz  manner,  computing  only  znz  at  step  2.1  in  each  stage 
(  the  formulae  for  z,  are  independent  of  nz  )  and  moving  step  2.2  out  of  the  loop  (  a 


is  independent  of  nz  ).  Finally  step  2.3  is  computed  using  a  version  of  the  algorithm 
described  in  [7]  rewritten  to  make  it  recursi%'e  in  the  size  of  the  matrix. 

The  equations  for  steps  2.1  and  2.2  are  unchanged,  the  correction  equation  9  — 
9  -  Wn  iV22l;z  is  unchanged,  but  it’s  calculation  is  now'  different.  It  is  implemented 
by  first  computing  wz  in  W22 wz  =  z  and  then  computing  9  from  9  =  9—  VTi 2wz, 
and  we  can  calculate  the  wz  in  a  recursive  way.  The  resulting  modified  algorithm 
is  detailed  in  Table  4.1 


4.1.2  Criteria  for  Choosing  nz  and  i 

We  need  some  criteria  to  use  for  choosing  nz  and  i.  Any  implementation  will 
have  finite  resources  (  in  terms  of  time  and/or  memory  )  available  for  the  estimator. 
Subject  to  these  constraints,  we  want  to  minimize  variance  of  the  parameter  or 
spectral  estimates,  or,  in  terms  of  a  single  realization,  minimize  an  error  norm. 
However,  the  error  norm  is  not  computable  by  the  estimator,  so  we  need  some 
function  of  the  estimate  that  provides  a  reliable  indication  of  the  error. 

Finding  the  minimum  possible  error  would  involve  checking  the  value  of  the  error 
criterion  for  all  allowed  nz  and  i  combinations;  this  is  not  practical  because  it  would 
require  a  large  amount  of  computations  to  implement.  A  more  computationally 
feasible  (  but  suboptimal  )  solution  is  first  to  step  nz  for  i  =  1,  until  some  desired 
value  of  nz  is  obtained  then  continue  to  use  this  value  of  nz  while  increasing  i  as  far 
as  possible  without  violating  an  error  criterion. 

It  was  experimentally  found  that  nearly  every  internal  quantity  became  much 
larger  if  the  number  of  postiterations  was  made  too  large.  As  a  result,  detection  of 
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Table  4.1:  The  Recursive  in  nz  Version  of  the  Algorithm 


1.  (a)  Use  (2.14)  to  find  f 


(b)  Use  (2.11)  with  r  in  place  of  r  to  find  a  (  in  a  least  squares  sense  if  we 


choose  K  >  na  +  nc  ) 


(c)  Use  (2.12)  with  a  and  r  in  place  of  a  to  find  b 


2.  Use  r,  a,  b  to  find  f  using 


(a)  Find  a  using  (2.31) 


(b)  Step  nz  from  1  to  the  desired  value  and  for  each  step  compute 


i.  (2.26)  to  find  z„ 


ii.  (2.27)-(2.31)  to  find  W{j 


(c)  For  the  final  value  of  nz  use  (2.25)  to  find  0  from  z  and  Wij 


3.  (a)  Use  (2.11)  with  f  in  place  of  r  to  find  a  (  again  possibly  in  a  least  squares 


sense  ) 


(b)  Use  (2.12)  with  a  and  f  in  place  of  a  to  find  b 


too  large  a  value  of  i  is  not  at  all  difficult.  We  have  used: 


- — ~~  <  1  -f  t)  and  i  <  max  i 
I  —  1 1  oo 

because  r  is  computed  early  in  the  postiteration  loop  (  step  2.1  in  Table  2.5  ),  so 
if  the  error  criterion  is  not  satisfied,  then  the  effort  invested  in  the  unsatisfactory 
postiteration  is  still  quite  small. 

This  heuristic  turns  out  to  be  quite  robust.  Regardless  of  how  nz  is  determined, 
or  how  the  random  number  generator  used  to  generate  the  data  sequence  is  initial¬ 
ized.  For  our  simulations 

T]  ~  0.004 
max  i  =  10 


was  a  choice  which  prevented  more  than  about  4%  of  the  Monte  Carlo  iterations 
from  going  up  to  max  i,  again  quite  independently  of  noise  seed  value  and  nz  heuris¬ 
tic. 


A  good  nz  heuristic  is  more  difficult  to  find  because  the  effect  of  a  wrong  choice 
of  nz  on  the  internal  variables  is  much  more  complex  than  in  the  case  of  a  wrong 
choice  of  i.  High  error  norms  are  associated  with  the  a  and  /?  vectors(  see  step 
2b  of  Table  2.5,  but  the  relation  is  so  complex  that  it  is  not  suitable  for  use  as  a 
criterion.  The  heuristic  used  is  based  on  the  norm  of  the  correction  vector  instead. 
The  reason  for  this  choice  is  that  a  large  correction  means  that  r  was  an  inaccurate 
estimate,  and  given  the  sensitivity  problems  in  the  correction  step,  this  probably 
means  the  correction  will  be  inaccurate.  So  the  heuristic  used  in  step  2.3  in  table 


2.5  is: 


<  c  and  nz  <  max  nz 


experimentally  we  found  that: 


e  ~  0.15 


max  nz  =  7 


were  good  choices.  However,  robustness  of  this  criterion  is  poor.  Although  the 
estimated  variances  are  fairly  insensitive  to  the  choice  of  max  nz,  changes  in  e  by 
5%  result  in  markedly  different  performance.  Moreover,  as  shown  below,  even  if  e  is 
chosen  to  minimize  the  variance  of  the  estimated  parameters,  the  resulting  variance 
is  higher  than  the  variance  realized  by  fixing  nz  to  a  “good”  value. 


4.1.3  Differences  Between  the  Fixed  and  the  Recursive  Approaches 

How  much  improvement  does  the  self- adjustment  provide?  Of  course,  in  answer¬ 
ing  this  question,  one  must  decide  what  is  meant  by  “improvement”:  for  example, 
is  improvement  sought  in  the  r,  estimates,  the  spectral  estimate  or  the  estimates  of 
the  ARMA  parameters  (  a,  and  b,  )? 

The  estimates  of  the  a's  and  b's  do  not  improve  with  a  self- searching  algorithm; 
while  the  sharp  rises  in  variance  for  high  nz  and/or  i  values  can  be  avoided  the 
resulting  optimal  variance  is  still  considerably  (  about  a  factor  1.5  to  4  )  larger 
than  optimal  hand  tuned  fixed  values  of  nz  and  i.  This  is  true  for  all  examples 
and  datalengths;  the  factor  1.5-4  cited  above  is  for  200  data  points.  This  factor 
decreases  as  the  data  length  increases,  and  extensive  simulations  confirm  this. 

The  r  estimates  do  improve  with  a  self  searching  strategy:  the  differences  are  in 
the  2nd  and  3rd  digits,  but  they  are  on  the  order  of  the  differences  between  different 
fixed  values,  so  this  is  a  large  relative  improvement.  Note  that  the  differences 
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Figure  4.1:  A  Histogram  of  nz  and  i  Combinations  chosen  by  the  Recursive  Algo¬ 
rithm  for  Example  1,  with  200  data  points. 
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between  r,  from  different  the  Monte  Carlo  simulations  typically  occur  in  the  2nd  to 
1st  digit. 

4.1.4  Conclusions 

The  self-adjusting  strategy  does  improve  the  rapid  rise  in  variance  for  high  nz 
that  was  typical  for  the  higher  order  examples  to  the  more  gradual  rise  observed 
in  the  lower  order  examples.  The  slow  improvements  in  the  spectral  estimates  for 
example  1  and  3  indicate  that  using  higher  nz  values  may  be  desirable.  However, 
the  self-adjusting  strategy  fails  to  achieve  the  same  low  variances  achieved  by  the 
original  algorithm  for  lower  values  of  nz.  Similarly,  the  self-adjusting  strategy  en¬ 
ables  us  to  use  more  than  one  postiteration,  but  the  resulting  variance  is  higher 
than  that  of  the  original  algorithm  with  one  postiteration.  Contrary  to  the  ARMA 
parameter  estimates  and  the  spectral  estimates,  the  autocorrelation  estimates  do 
improve  with  this  strategy. 

An  efficient  recursive  in  nz  implementation  of  the  algorithm  developed  as  a  part 
of  this  study  is  interesting  in  its  own  right,  and  useful  in  applications  where  multiple 
estimates  for  different  nz  are  desired. 

4.2  Pole  and  Zero  Adjustments 

Another  way  of  limiting  the  problem  of  high  variances  for  high  values  of  nz  is 
suggested  by  the  structure  of  the  algorithm;  as  shown  in  [16],  convergence  in  nz  is 
slow  if  the  process  has  zeros  close  to  the  unit  circle.  Furthermore,  calculation  of  the 
correction  step  involves  finding  the  first  few  elements  of  the  impulse  response  of  the 
process  with  transfer  function  (  see  step  2.2  in  Table  2.5);  one  would  expect 


numerical  problems  here  if  there  are  poles  close  to  the  unit  circle. 

To  alleviate  these  problems  one  can  consider  changing  the  algorithm  to  assure 
that  all  poles  and/or  zeros  are  sufficiently  far  from  the  unit  circle.  In  principle 
there  are  two  ways  of  doing  this:  one  can  move  only  the  offending  roots,  or  one  can 
scale  the  entire  polynoinial  (  which  moves  all  the  roots  ).  The  first  strategy  has  the 
advantage  that  it  introduces  the  minimal  change  in  the  polynomial  that  achieves 
the  objective.  The  second  strategy  has  the  advantage  that  it  does  not  introduce 
distortion  in  the  pole/zero  diagram,  but  only  scaling  of  the  entire  pole/zero  diagram. 
Also,  the  second  strategy  is  much  more  computationally  efficient.  As  a  second 
effect,  either  strategy  can  also  insure  that  the  numerator  polynomial  is  non-negative 
definite. 

Since  the^e  methods  tend  to  produce  |-2%  of  very  bad  outliers,  we  have  gathered 
statistics  both  with  and  without  the  outliers  taken  into  account.  Unless  stated 
otherwise,  the  statistics  presented  below  are  those  without  the  outliers  taken  into 
account. 

4.2.1  Individual  Pole  and  Zero  Movement 

As  discussed  above,  we  can  move  either  only  the  poles  or  only  the  zeros,  or  both. 
These  two  cases  are  discussed  below. 

Movement  of  Both  Poles  and  Zeros 

The  revised  algorithm  works  as  follows:  after  computation  of  a  and  6,  we  cal¬ 
culate  new  variables  a  and  6,  the  “moved"  versions  of  a  and  b ,  respectively.  The 
algorithm  then  proceeds  using  a  and  b  in  steps  2.1  through  2.3  of  table  2.5.  If  there 


are  multiple  postiterations,  the  same  is  done  with  a  and  b. 

The  a  parameters  are  computed  by  solving  for  the  roots  of  A,  checking  how  close 
each  root  is  to  the  unit  circle,  and  moving  any  offending  roots  radially  outward  to 
a  circle  with  radius  1  +  <5  for  some  chosen  6.  The  alteration  of  the  b  coefficients  is 
slightly  more  involved,  as  the  B  polynomial  contains  both  the  roots  of  C(z)  and 
C(z_1)  (  see  equation  (2.4)  ):  the  former  have  to  be  moved  out,  the  latter  in.  Also, 
any  roots  with  norm  equal  to  1  plus  or  minus  machine  precision  have  to  be  paired 
and  then  mo%'ed,  one  out,  one  in.  Pairing  of  a  root  z,  involves  finding  the  closest 
root  zj,  redefining  both  roots  to  be  the  mean  Z-~L,  and  then  moving  the  one  radially 
out,  the  other  radially  in. 

Comparison  of  the  revised  algorithm  with  the  original  shows  that  it  does  not 
have  the  sharply  rising  variances  for  high  nz\  in  fact,  for  6  «  0.001,  the  curve 
is  almost  flat.  This  elimination  of  the  problem  with  the  rising  variances  is  quite 
insensitive  to  the  exact  choice  of  6:  one  needs  a  variation  on  the  order  of  6  =  0.0003- 
0.003  to  make  an  appreciable  difference.  However,  we  observe  reduced  performance 
for  the  lower  values  of  nz,  varying  from  almost  no  change  for  the  r’s  to  between 
109c  and  a  factor  of  2  for  the  a’s  and  6’s.  This  is  distributed  over  the  parameters 
in  an  irregular  way,  suggesting  that  the  problem  may  be  more  pole-oriented  than 
parameter  oriented. 

Pole-Only  Movement 

Comparison  shows  that,  as  expected,  most  of  the  effects  (  both  the  beneficial 
and  the  detrimental  )  come  the  pole  movements,  so  we  have  tried  moving  only 
the  poles.  The  results  are  almost  as  good  as  the  combined  pole/zero  movement 


case,  and,  of  course,  the  computational  load  is  reduced  considerably.  This  implies 
that  non-negative  numerator  estimates  do  not  make  a  major  contribution  to  the 
variance.  The  above  statements  are  backed  by  extensive  simulations  performed  on 
all  four  examples. 

4.2.2  Scaling  Movement  of  All  Poles  and  Zeros 

In  this  section  we  consider  another  modification  in  which  all  poles  are  moved 
simulataneously.  Since  it  has  been  found  that  the  location  of  the  estimated  zeros 
hardly  influence  the  results,  they  are  not  moved.  The  algorithm  is  modified  in  the 
same  way  as  before,  except  that  a  and  b  (  6^6,  see  below  )  are  now  computed  in  a 
different  way.  The  parameters  a  are  now  found  by  checking  the  roots  of  .4,  and,  if 
any  are  too  close  to  the  unit  circle,  scaling  the  entire  polynomial,  in  effect  moving 
all  roots  radially  out  by  the  same  factor.  B  is  then  scaled  by  the  same  factor;  this 
was  thought  to  have  the  effect  of  keeping  the  residues  of  the  poles  approximately 
constant,  i.e.,  if  we  were  to  do  a  partial  fraction  expansion  on  B /(A(z~l)A(z))  we 
would  obtain  approximately  the  same  constants  in  the  numerators  as  in  the  partial 
fraction  expansion  of  B /(A(z~l)A(z)).  Since  the  method  did  not  have  solve  the 
problems  of  the  original  algorithm,  a  formal  proof  of  the  above  claim  about  the 
residues  was  abandoned.  Obviously  a  much  more  computationally  efficient  method 
would  be  to  use  a  stability  test  on  the  polynomial  and  then  scale  by  the  maximum 
factor  if  it  turns  out  not  to  be  stable  enough,  but  this  method  may  move  roots  by  a 
greater  extent  than  necessary.  We  want  to  try  the  maximum  possible  performance 
method  first,  to  see  how  much  improvement  this  approach  can  maximally  give. 

It  has  been  found  that  this  method  results  in  lower  variances  than  the  previous 


two.  See  the  a]  versus  nz  plot  (  Figure  B.21  )  for  a  typical  result;  for  comparison, 
the  curves  from  the  standard  algorithm  have  also  been  plotted  in  the  Figure.  As 
this  Figure  shows,  the  variances  of  the  revised  algorithm  are  still  above  those  for 
the  original  algorithm  for  nz  greater  than  1;  this  is  typical  for  all  parameters  in 
all  examples.  As  can  be  seen  from  the  spectral  plot  Fig.  B.20,  the  spectral  plots 
become  more  smooth  than  those  in  Fig.  B.12.  Further  improvement  can  be  gained 
by  using  the  tilde  polynomials,  i.e.  the  uncorrected.  a  and  b  for  the  computation 
of  the  z  vector.  This  makes  the  method  even  more  insensitive  to  changes  in  the  6 
radius,  which  is  an  advantage,  as  it  reduces  the  need  for  tuning. 

On  the  basis  of  the  experience  that  the  matrix  inversion  is  not  ill-conditioned 
for  any  of  our  examples,  we  could  modify  only  the  IF] 2  matrix,  i.e.  the  a  sequence. 
This  again  improves  performance.  These  changes  are  based  on  an  analysis  of  the 
effects  of  the  scaling  to  be  detailed  below.  The  principle  used  is  to  reduce  the  effects 
of  the  pole  movements  on  the  final  estimate. 

Since  the  correction  in  the  B  polynomial  to  keep  the  residues  of  the  poles  con¬ 
stant  is  only  an  approximation,  the  reduction  of  the  effects  of  the  pole  movements 
cannot  be  total,  but  the  correction  improves  performance  to  a  level  very  close  to  the 
original  algorithm  for  low  nz  values,  while  only  deteriorating  gradually  for  higher 
values.  The  main  problem  with  this  method  is  that  the  pole  correction  step  pro¬ 
duces  outliers:  about  0.5-29c  of  the  Monte  Carlo  iterations  have  an  error  that  is 
far  higher  than  the  rest.  The  above  results  are  based  on  statistics  from  which  the 
outliers  have  been  removed;  with  the  outliers,  the  statistics  are  considerably  worse, 
so  much  worse,  in  fact,  that  they  are  worse  than  the  original  algorithm.  The  outliers 
are  detected  by  keeping  track  of  the  20  iterations  with  the  worst  error  norms  and 


reporting  those,  then  a  limit  is  set  manually,  after  which  the  program  accumulates 
both  normal  and  no-outlier  statistics  (  i.e.  everything  worse  than  the  manual  limit 
is  ignored  ).  The  error  norm  is 


y,  ai  -  Q,  bj  -  a, 

ho  a'  ho  k 

There  is  a  considerable  gap  (  almost  a  factor  2  )  between  “normal”  and  “outlier” 
values  of  this  norm;  the  deterioration  for  high  values  of  nz  is  accompanied  (  and 
partly  caused  by  )  an  increase  of  the  number  of  outliers. 

Our  attempts  to  remove  outliers  on-line  have  failed,  so  there  is  currently  no 
way  of  exploiting  the  fact  that  the  increase  of  the  variances  is  mainly  due  to  a  few 
outliers.  On  the  other  hand,  the  no-outlier  statistics  represent  a  behaviour  that 
is  typical  for  for  98%  or  more  of  the  iterations  in  any  Monte  Carlo  simulation. 
However,  it  does  mean  that  overal,  the  original  algorithm  works  better  the  modified 
one. 

Removal  of  outliers  from  the  original  method  has  also  been  studied,  but  it  has 
far  less  effect  than  in  the  case  of  the  modified  algorithms,  so  much  less  in  fact, 
that  comparisons  between  the  original  algorithm  with  or  without  its  outliers  show 
almost  no  change. 


4.3  Using  High  Order  Yule- Walker  Equations 

As  has  been  established  through  extensive  simulations  (  [4],  [5]  ),  using  high 
order  (  also  called  ovcrdctermined  )  Yule  Walker  equations  usually  leads  to  reduced 
variances  in  the  estimates,  except  when  the  autocorrelation  sequence  decreases  very 
rapidly.  Therefore  we  should  try  what  happens  if  we  use  High  Order  Yule-Walker 
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equations  (HOYW)  in  the  algorithm  under  study.  Specifically,  in  Table  2.5,  we 
choose  K  >  na  +  nc  in  steps  lb  and  3a. 

Figure  B.22  shows  the  behaviour  of  the  a i  parameter  of  Example  1,  as  can  be 
seen,  the  variances  are  down  dramatically  from  the  ordinary  Yule- Walker  method, 
the  sharp  rise  in  variances  around  nz  =  10  is  replaced  by  a  more  gentle  one,  but  the 
improvements  are  largely  gene  as  well:  the  curve  is  almost  flat  now.  Unfortunately, 
there  did  not  remain  enough  time  to  investigate  these  phenomena  in  detail. 


5.  Conclusions 


We  have  seen  that  the  algorithm  proposed  in  [16]  achieves  high  quality  esti¬ 
mates  for  large  data  lengths.  For  short  datalengths  there  is  a  problem  associated 
with  the  number  of  instrumental  variables:  if  this  number  is  chosen  larger  than  ap¬ 
proximately  5%  of  the  datalength,  the  variances  increase  very  rapidly  as  a  function 
of  the  number  of  instrumental  variables.  This  effect  becomes  more  pronounced  for 
processes  as  the  process  order  increases.  Most  of  the  statistical  improvement  of  the 
algorithm  over  the  Yule  Walker  estimator  is  gained  by  the  addition  of  the  first  2  in¬ 
strumental  variables;  after  that,  additional  instruments  result  in  small  (  in  the  case 
of  spectral  estimates  )  or  negligible  (  in  the  case  of  ARMA  coefficient  estimates  ) 
improvements.  If  the  number  of  instrumental  variables  is  in  the  range  3-0.05./V  the 
variances  are  relatively  insensitive  to  the  exact  number  of  instruments.  The  sug¬ 
gested  method  [16]  of  improving  the  algorithm’s  performance  for  short  datalengths 
by  repeating  the  correction  step  multiple  times  does  not  work  for  the  cases  tried, 
as  it  results  in  rapidly  (  as  a  function  of  the  number  of  postiterations  )  increasing 
variances  of  the  estimates. 

Using  a  self  adjusting  version  of  the  algorithm,  where  each  realization  searches 
for  best  number  of  instrumental  variables  and  postiterations,  avoids  the  problem 
of  finding  proper  values  for  the  number  of  instruments  and  postiterations,  changes 
the  very  rapid  rises  in  variance  for  high  nz  to  a  more  gradual  one,  but  increases 
the  variance  of  the  ARMA  parameter  estimates  for  lower  numbers  of  instruments 
as  compared  to  the  original  algorithm. 

Using  a  stabilized  version  of  the  estimated  autoregressive  parameters  in  some 
steps  of  the  algorithm  and  approximately  compensating  the  changes  this  causes  in 


the  partial  fraction  expansion  of  the  spectrum,  also  is  able  to  remove  the  rapid 
variance  increases  for  large  numbers  of  instruments  in  about  98%  of  the  cases,  and 
has  only  slightly  worse  variances  for  lower  values  of  nz ,  but  suffers  from  0.5%  to  2% 
outliers. 

Using  overdetermined  Yule  Walker  equations  improves  the  performance  of  the 
estimator,  for  the  cases  tried,  but  this  work  is  very  incomplete. 

In  summary  the  original  algorithm  works  very  well  for  long  data  records,  but 
great  care  has  to  be  used  when  applying  it  to  short  records.  In  particular,  it  seems 
safest  to  not  to  use  more  than  2  instruments  and  only  one  postiteration.  Processes 
with  high  order,  or  zeros  close  to  the  unit  circle  should  not  be  estimated  from  short 
data  lengths  (  shorter  than  least  several  hundred  data  points  ).  In  many  applications 
this  will  mean  that  the  algorithm  is  not  suitable  for  estimating  the  parameters  of 
such  processes.  One  can  try  to  use  one  of  the  variant  algorithms  developed  in  chapter 
4.  If  possible,  one  should  estimate  multiple  records  and  compute  experimental 
variances:  these  clearly  show  for  what  values  of  nz,  i  and  N  the  algorithm  achieves 
good  statistical  performance. 


A.  The  Correction  Step:  Mathematics 


In  this  appendix  we  give  a  derivation  of  the  algorithm;  it  is,  of  course,  not  a 
replacement  for  [16]. 

Let  X  be  as  defined  in  (2.18)  and  W  as  in  equations  (2.19)-(2.23),  and  assume 
that  an  estimate  W  of  IV,  such  that  | W  —  W \  =  0(1/ \/~N)  can  be  calculated  from 
the  available  data.  Since  we  are  dealing  with  the  asymptotic  case,  (2.18)  is  not  too 
restrictive,  since  in  many  cases  a  central  limit  theorem  will  apply. 

The  asymptotic  log-likelihood  function  of  X  is  given  by  [9] 


m  =  -f  ln27r  —  i  In \W\  -  j(X  -  X)TW~\X  -  X) 


(A.l) 


The  ML  estimate  is  therefore  given  by  solving  for  a  9  such  that 

dm  n 


Differentiating  (A.l)  and  setting  the  derivative  to  zero  gives 


(X  -  X)T^(X  -  X) 


-  \  A  In  I  W\  +  0) W-'(X-X)-^ 


(X  -  X)T®£(X  -  X) 


=  0  (A. 2) 


Let  us  assume  that  (A. 2)  has  a  solution,  9,  with  respect  to  9.  Under  certain  condi¬ 
tions  the  estimate  will  be  consistent  and  \9  —  9\  =  0(1/ \/N).  Solving  this  equation 
is  a  computationally  difficult  problem;  however  we  can  make  an  0(1/Ar)  approxima¬ 
tion  without  affecting  the  asymptotic  behaviour  of  the  estimator.  For  large  enough 
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N  we  have  from  (2.18): 
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This  allows  us  to  rewrite  (A. 2)  as 
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Now  we  partition  X  and  W  as  indicated  in  (2.19)-(2.23),  and  use  a  standard  result 


on  the  inverse  of  partitioned  matrices  to  get 


W-1  =  W22  [07]  + 
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12^22 


(A.  5) 


Together  with  (A. 4)  this  gives 


fi  =  *  -  W12W22lz 


(A.  6) 


as  an  order  1/N  approximation  of  the  ML  estimate  of  9.  This  approximation  is 


generally  not  implementable  because  Wij  are  unknown  (  data  dependent  ).  Know¬ 
ing  that  z  is  0(1  /wN)  though,  we  can  replace  W \j  by  their  consistent  estimates 


without  changing  the  order  of  the  approximation: 


9  =  x-  W,AV0-'z 


which  is  (2.25). 
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B.  Plots 
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Figure  B.C:  Variance  and  sum  squared  error  of  b0  versus  nz  for  Example  2,  with 


Variance  and  sum  squared  error  of  b i  versus  nz  for  Example 
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Figure  B.14:  Mean  and  standard  deviation  of  spectral  estimates  for  Example  1, 
with  2000  data  points,  nz=0-7 
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Figure  B.20:  Mean  and  standard  deviation  of  spectral  estimates  for  Example  1 
with  200  data  points,  nz=0,l,3,7  with  the  poles  moved 


variance 


References 


[1]  Bellanger,  M.G.  “New  Applications  of  Digital  Signal  Processing  in  Communi¬ 
cations ”,  IEEE  ASSP  Magazine,  3,  3,  6-11,  july  1986 

[2]  Berkhout,  A.J.  “ The  Seismic  Method  in  the  Search  for  Oil  and  Gas:  Current 
Techniques  and  Future  developments” ,  Proc.  IEEE  74,  8,  1133-1159,  august 
19S6 

[3]  Box,  G.  and  G.  Jenkins  “Time  Series  Analysis:  Forecasting  and  Control ”,  San 
Fransisco:  Holden-Day,  1976 

[4]  Cadzow,  J.A.  “High  Performance  Spectral  Estimation  —  A  New  ARMA 
method”,  IEEE  Transactions  on  Acoustics,  Speech  and  Signal  Processing, 
ASSP- 28,  5,  524-529,  October  1980 

[5]  Cadzow,  J.A.  “Spectral  Estimation:  an  Overdetermined  Rational  Model  Equa¬ 
tion  Approach” ,  Proc.  IEEE,  70,  9,  975-989,  September  1982 

[6]  Cadzow,  J.A.  “ Noise  Compensation  for  Auto  Regressive  Model  Identification” , 
IEEE  Transactions  on  Automatic  Control,  AC-19,  716-723,  December  1974 

[7]  Dickinson,  B.W.  “Efficient  Solution  of  Linear  Equations  with  Banded  Toeplitz 
Matrices”,  IEEE  Trans.  ASSP,  ASSP-27,  4,  421-423,  August  1979 

[8]  Friedlander,  B.  and  B.  Porat  “A  General  Lower  Bound  for  Parametric  Spectral 
Estimation” ,  IEEE  Trans.  ASSP,  ASSP-32,  4,  728-733,  August  1984 

[9]  Haykin,  S.  “Nonlinear  Methods  of  Spectral  Analysis” ,  Springer-Verlag,  Berlin, 


[10]  Jayant,  N.S.  “Coding  Speech  at  Low  Bit  Rates”,  IEEE  Spectrum  23,  8,  58-63, 
August  19S6 

[11]  Kaveh,  M.  “High  Resolution  Spectral  Estimation  for  Noisy  Signals”,  IEEE 
Trans  ASSP,  ASSP-27,  3,  286-287,  June  1979 

[12]  Makhoul,  J.  “Linear  Prediction:  A  Tutorial  Review”,  Proc.  IEEE  63,  4,  561- 
5S0,  April  1975 

[13]  Mendel,  J.M.  “Some  Modeling  Problems  in  Reflection  Seismology”,  IEEE 
ASSP  Magazine,  3,  2,  4-17,  april  1986 

[14]  Moses,  R.,  P.  Stoica,  B.  Friedlander,  T.  Soderstrom  “An  Efficient  Linear 
Method  for  ARM  A  Spectral  Estimation” ,  To  Appear 

[15]  O'Shaughnessy,  D.  “Speaker  Recognition  IEEE  ASSP  Magazine,  3,  4,  4-17, 
October  1986 

[16]  Stoica,  P.,  B.  Friedlander,  T.  Soderstrom  “An  Approximate  Maximum  Likeli¬ 
hood  Approach  to  ARM  A  Spectral  Estimation  ”,  SCT  Technical  Report  5498- 
03,  February  1985 

[17]  Walker,  A.M.  “Large  Sample  Estimation  of  Parameters  for  Autoregressive  Pro¬ 
cesses  with  Moving  Average  Residuals” ,  Biometrika,  vol.  49,  pp.  117-131,  1962. 

[18]  Wessels,  J.  “Stochastische  Processen  II” ,  THE-syllabus,  edited  by  drs.  W.H.M. 
Zijm  Eindhoven  University  of  Technology,  1978  (  in  Dutch  ) 


